Abstract: Rhetorical Structure Theory (RST) has been an 
active research topic for a number of years. The different 
parts of a text are called text spans, and RST describes how 
these text spans are organized. Text spans are connected to 
each other by discourse markers. In this paper, we explore 
the possibility of identifying some RST relations (CONTRAST, 
SEQUENCE and PARALLEL) in multi-nuclear sentences 
in the Bengali language with the help of the semantic 
structure of the sentences and the discourse markers. We 
present a rule-based approach for identifying the RST 
relations from the discourse markers in Bengali. 

Keywords - RST, Discourse Marker, Compound Sentence, 
Syntactic Aggregation, Semantic Representation. 

I. Introduction 

Rhetorical Structure Theory is a theory of the organization of 
text spans: how those text spans function and how they are 
related and connected to each other [2]. Each text span can 
have one of two roles in a relation: it can be a nucleus or a 
satellite. A nucleus represents the central unit of the 
sentence; it can exist independently and is considered 
essential to the understanding of the text. A satellite is less 
central and can never exist independently; it depends on the 
nucleus and contributes additional information to it [3]. 

A discourse marker is a word or phrase that logically 
connects two text spans that are coherent with each other [2]. 
Examples of discourse markers are: but, then, and, although, 
yet, with, etc. 

In this paper, we deal only with compound sentences, i.e. 
sentences having more than one nucleus. Compound sentences 
are formed by combining two or more simple sentences, each 
having one nucleus. Some examples are given below. 

(1) CONTRAST: Ram is very happy with his marks but his 
sister is very upset with her marks. 

Here the discourse marker is "but" and the RST relation is 
CONTRAST. 

(2) SEQUENCE: Bumba is going to the shop, then he will go 
to visit his grandmother. 

Here the discourse marker is "then" and the RST relation is 
SEQUENCE. 

(3) PARALLEL: Dipshika is eating while watching the 
television. 

Here the discourse marker is "while" and the RST relation is 
PARALLEL. 

There are many more relations, such as EXPRESSION, 
CONJUNCTION, CONTINUATION, LIST, etc. 

A. Objective 

The work in this paper has been done for the Bengali 
language. Since Bengali is a free word order language, 
ambiguity occurs in mapping discourse markers to RST 
relations: some discourse markers can be mapped to more than 
one RST relation. To resolve this ambiguity, we have 
formulated an algorithm for each of the RST relations. Given 
the semantic representation of a compound sentence, the aim 
is to identify the RST relation of the sentence as accurately 
as possible. 

Figure 1: Inputs of the function 

Figure 1 illustrates the system we are building. Our 
algorithm, or function, receives the semantic representations 
of simple sentence 1 and simple sentence 2 along with the 
discourse marker, and returns the corresponding RST relation 
as output. Our objective is thus to identify the RST relation 
from the discourse marker in Bengali. 
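
As a rough illustration of this interface, the following Python sketch models a simple sentence's semantic representation and a function that maps two such representations plus a discourse marker to a relation. The names `Clause`, `identify_relation` and `polarity_rule`, the field names, and the idea of trying rules in order are assumptions made for illustration only; they are not taken from the system described in this paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Clause:
    """Assumed, simplified semantic representation of one simple sentence."""
    verb: str
    tense: str                    # e.g. "past", "present", "future"
    polarity: str                 # "positive" or "negative"
    actor: Optional[str] = None   # the "Ke" (who) argument
    time: Optional[str] = None    # the "Kakhana" (when) argument


# A rule inspects both clauses and the marker and may propose a relation.
Rule = Callable[[Clause, Clause, str], Optional[str]]


def identify_relation(s1: Clause, s2: Clause, marker: str,
                      rules: Sequence[Rule]) -> Optional[str]:
    """Return the first relation proposed by any rule, or None."""
    for rule in rules:
        relation = rule(s1, s2, marker)
        if relation is not None:
            return relation
    return None


def polarity_rule(s1: Clause, s2: Clause, marker: str) -> Optional[str]:
    """Toy rule (hypothetical): differing polarity suggests CONTRAST."""
    return "CONTRAST" if s1.polarity != s2.polarity else None


s1 = Clause(verb="love", tense="present", polarity="positive")
s2 = Clause(verb="love", tense="present", polarity="negative")
print(identify_relation(s1, s2, "but", [polarity_rule]))  # CONTRAST
```

Keeping each relation's test as a separate rule function mirrors the paper's choice of one algorithm per relation.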

II. Related Works 

One of the well-known works on discourse markers and 
RST is by Maite Taboada (Discourse markers as signals 
(or not) of rhetorical relations) [1], where she discusses 
the idea that discourse markers, or connectives, signal the 
presence of a particular relation. She addresses the 
relationship between discourse markers and rhetorical 
relations and, more generally, the signaling of rhetorical 
relations. Together with William C. Mann (Rhetorical 
Structure Theory: Looking Back and Moving Ahead) [5], she 
reviewed some of the discussions about RST, especially 
issues of the reliability of analyses and psychological 
validity, together with a discussion of the nature of text 
relations. 

William C. Mann and Sandra A. Thompson (Rhetorical 
Structure Theory: Towards a functional theory of text 
organization) [3] set out what Rhetorical Structure Theory 
actually is. This paper marks the origin of RST. It describes 
RST as a descriptive theory of a major aspect of the 
organization of natural text and establishes a new 
definitional foundation for RST. 

Daniel Marcu and Abdessamad Echihabi (An 
Unsupervised Approach to Recognizing Discourse 
Relations) [9] present an unsupervised approach to 
recognizing the discourse relations of contrast, 
explanation-evidence, condition and elaboration that hold 
between arbitrary spans of text. 

Sumit Das, Anupam Basu and Sudeshna Sarkar (Discourse 
Marker Generation and Syntactic Aggregation in Bengali 
Text Generation) [2] describe the prevalent syntactic 
aggregation constructs in Bengali. The paper presents a 
rule-based approach to generating Bengali compound 
sentences using the identified constructs. 

III. The Proposed System 

We consider only compound sentences. There are many 
types of multi-nuclear RST relations, but in this paper we 
deal only with CONTRAST, SEQUENCE and PARALLEL. In 
Bengali, some of the discourse markers are: 
[Bengali markers illegible], comma (,), etc. 

We have collected many Bengali compound sentences 
and manually divided each of them into two simple sentences. 
Consider, for example, the Bengali sentence 
"[Bengali text illegible]" [10]. 

clause_count: 2 
type: composite 
rhetoric_relation: ? 
discourse marker: [Bengali text illegible] 
clause_begin:# 
predicate_begin:# 
theme=love 
aspect=simple 
polarity=positive 
predicate_end:# 
argument_begin:# 
arg_name=kar 
argument_end:# 
argument_begin:# 
arg_name=kothai 



This compound sentence can be divided into two simple 
sentences: simple sentence 1, "[Bengali text illegible]", and 
simple sentence 2, "[Bengali text illegible]". The discourse 
marker is "[Bengali text illegible]". 

Figure 2 shows the semantic representation of the 
compound sentence [3], where the clause count is given as 2, 
the type of the sentence is given as composite, the discourse 
marker is identified as "[Bengali text illegible]" and the 
rhetoric relation is given as a question mark, reflecting the 
objective of our paper, which is to identify the RST relation. 
We have designed algorithms for the RST relations 
CONTRAST, SEQUENCE and PARALLEL. 

• The algorithm for CONTRAST: - 

Algorithm CONTRAST(simple_sentence1, simple_sentence2, 
discourse_marker) 
{ 
If (discourse_marker = "[illegible]" || discourse_marker = 
"[illegible]" || discourse_marker = "[illegible]" || 
discourse_marker = "[illegible]") then 
    RST = CONTRAST 
Else if (discourse_marker = "[illegible]" || discourse_marker = 
"[illegible]" || discourse_marker = "[illegible]" || 
discourse_marker = "[illegible]") then 
Begin 
    If (polarity(simple_sentence1) != polarity(simple_sentence2)) 
    then 
        RST = CONTRAST 
    Else if ((polarity(simple_sentence1) = 
    polarity(simple_sentence2)) && (verb(simple_sentence2) = 
    antonym(verb(simple_sentence1)))) then 
        RST = CONTRAST 
End if 
Return RST 
} 



argument_end:# 
clause_end:# 
sentence_end:# 
sentence_begin:# 
clause_begin:# 
predicate_begin:# 
verb_id=[illegible] 
theme=hate 
tense=present 
aspect=simple 
polarity=negative 
predicate_end:# 
argument_begin:# 
arg_name=kar 
n-root=[Bengali text illegible] 
argument_end:# 
argument_begin:# 
arg_name=kothai 
n-root=[Bengali text illegible] 
argument_end:# 
clause_end:# 
sentence_end:# 



Figure 2: Semantic representation of the compound sentence after syntactic aggregation [2] [4]. 

Let us take some examples for Contrast: 

Example 1: "[Bengali text illegible]" [10]. Here the 
discourse marker is "[Bengali text illegible]" and, according 
to our algorithm, if this is the discourse marker then the RST 
relation is Contrast. 

Example 2: "[Bengali text illegible]" [10]. Here the 
discourse marker is "[Bengali text illegible]", so we need to 
check the semantic structure of the simple sentences. The 
polarities of the two sentences are different, so the RST 
relation is Contrast. 

Example 3: "[Bengali text illegible]" [10]. Here the 
discourse marker is "[Bengali text illegible]", so we check 
the semantic structure of the simple sentences. The polarities 
of the two sentences are the same, but the verb 
"[Bengali text illegible]" in simple sentence 2 is an antonym 
of the verb in simple sentence 1, so according to our 
algorithm the RST relation is Contrast. 
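
The CONTRAST rules can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the marker sets use romanized placeholder strings, the antonym table is a toy dictionary, and the `Clause` fields are hypothetical names; none of these reproduce the actual marker lists or lexical resources used in this work.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Clause:
    """Assumed minimal view of a simple sentence's semantic structure."""
    verb: str
    polarity: str  # "positive" or "negative"


# Placeholder marker sets (assumptions, romanized for readability).
UNAMBIGUOUS_CONTRAST = frozenset({"kintu"})
AMBIGUOUS_MARKERS = frozenset({"ebong", "ar"})

# Toy antonym lexicon (assumption); a real system needs a Bengali resource.
ANTONYMS = {"love": "hate", "hate": "love"}


def contrast(s1: Clause, s2: Clause, marker: str,
             antonyms: Mapping[str, str] = ANTONYMS) -> Optional[str]:
    """Return "CONTRAST" if the rules fire, otherwise None."""
    # Rule 1: some markers signal CONTRAST on their own.
    if marker in UNAMBIGUOUS_CONTRAST:
        return "CONTRAST"
    # Rule 2: ambiguous markers need evidence from the semantic structure.
    if marker in AMBIGUOUS_MARKERS:
        if s1.polarity != s2.polarity:
            return "CONTRAST"
        # Same polarity: look for an antonym verb in the second sentence.
        if antonyms.get(s1.verb) == s2.verb:
            return "CONTRAST"
    return None


pos_love = Clause(verb="love", polarity="positive")
neg_love = Clause(verb="love", polarity="negative")
pos_hate = Clause(verb="hate", polarity="positive")
print(contrast(pos_love, neg_love, "ebong"))  # CONTRAST
print(contrast(pos_love, pos_hate, "ebong"))  # CONTRAST
print(contrast(pos_love, pos_love, "ebong"))  # None
```

Returning `None` when no rule fires leaves room for the SEQUENCE and PARALLEL rules to be tried on the same input.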
• Algorithm for Sequence:- 



• Algorithm for function PRIORITY( ):- 



Let us take some examples for Sequence: 

Example 1: "[Bengali text illegible]" [10]. Here the 
discourse marker is "[Bengali text illegible]" and the 
polarities of the two simple sentences are the same. So, 
according to our algorithm, the RST relation is Sequence. 

Example 2: "[Bengali text illegible]" [10]. Here the two 
simple sentences are in different tenses. So, according to our 
algorithm, the RST relation is Sequence. 

Example 3: "[Bengali text illegible]" [11]. Here the two 
simple sentences refer to different times within a single day. 
We have defined a function PRIORITY( ) in which a single day 
is divided into 8 different spans, each with a different 
priority value. Since the priority value of 
"[Bengali text illegible]" is less than that of 
"[Bengali text illegible]", the RST relation is Sequence. 

Example 4: "[Bengali text illegible]" [10]. Here the verb 
"[Bengali text illegible]" changes to 
"[Bengali text illegible]" when the two simple sentences are 
joined to form the composite sentence. This gives a 
non-finite, non-repetitive form of the verb, and thus the RST 
relation is Sequence. 
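
The SEQUENCE rules and the PRIORITY( ) function can be sketched together in Python. This is only an illustrative sketch: the eight time-of-day labels, the marker string `"tarpor"`, and the `verb_form` encoding are assumptions introduced here, not the Bengali values used in the actual algorithms.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed English labels for the 8 spans of a day, in chronological order.
DAY_SPANS = ["dawn", "morning", "noon", "afternoon",
             "evening", "dusk", "night", "midnight"]
TIME_PRIORITY = {name: rank for rank, name in enumerate(DAY_SPANS)}

SEQUENCE_MARKERS = frozenset({"tarpor"})  # placeholder marker (assumption)
TENSE_PROGRESSION = {("past", "present"), ("present", "future")}


@dataclass(frozen=True)
class Clause:
    """Assumed minimal view of a simple sentence's semantic structure."""
    tense: str
    polarity: str
    time: Optional[str] = None   # the "Kakhana" (when) argument
    verb_form: str = "finite"    # "finite", "non-finite",
                                 # or "non-finite-repetitive"


def priority(time: Optional[str]) -> Optional[int]:
    """Map a time-of-day label to its priority value (0..7), if known."""
    return TIME_PRIORITY.get(time)


def sequence(s1: Clause, s2: Clause, marker: str) -> Optional[str]:
    """Return "SEQUENCE" if any of the four rules fires, otherwise None."""
    if marker in SEQUENCE_MARKERS and s1.polarity == s2.polarity:
        return "SEQUENCE"
    if (s1.tense, s2.tense) in TENSE_PROGRESSION:
        return "SEQUENCE"
    p1, p2 = priority(s1.time), priority(s2.time)
    if p1 is not None and p2 is not None and p1 < p2:
        return "SEQUENCE"
    if s1.verb_form == "non-finite":  # non-finite and non-repetitive
        return "SEQUENCE"
    return None


morning = Clause(tense="present", polarity="positive", time="morning")
night = Clause(tense="present", polarity="positive", time="night")
print(sequence(morning, night, "ebong"))  # SEQUENCE
print(sequence(night, morning, "ebong"))  # None
```

A lookup table replaces the chain of comparisons in PRIORITY( ), which keeps the ordering of the spans in one place.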

• Algorithm for PARALLEL RST:- 

Algorithm PARALLEL(simple_sentence1, simple_sentence2, 
discourse_marker) 
{ 
If ((simple_sentence1(Ke) = simple_sentence2(Ke)) && 
(tense(simple_sentence1) = tense(simple_sentence2))) then 
    RST = PARALLEL 
If ((verb(simple_sentence1) = non-finite form of a verb) && 
(verb(simple_sentence1) = repetitive form of a verb)) then 
    RST = PARALLEL 
Return RST 
} 

Let us take some examples for Parallel: 

Example 1: "[Bengali text illegible]" [10]. Here the actor 
("Ke") is the same in both sentences and both sentences are 
in the same tense. So the RST relation is Parallel. 

Example 2: "[Bengali text illegible]" [10]. Here the verb 
"[Bengali text illegible]" is the continuous form of the verb 
"[Bengali text illegible]"; in the composite sentence it 
changes into the repetitive form "[Bengali text illegible]", 
which is also a non-finite form of the verb. So the RST 
relation is Parallel. 
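
The two PARALLEL rules can be sketched in Python as below. The `Clause` fields and the `"non-finite-repetitive"` label are hypothetical names chosen for this illustration; the sketch assumes the verb form has already been determined by an earlier analysis step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Clause:
    """Assumed minimal view of a simple sentence's semantic structure."""
    actor: str                 # the "Ke" (who) argument
    tense: str
    verb_form: str = "finite"  # "finite", "non-finite",
                               # or "non-finite-repetitive"


def parallel(s1: Clause, s2: Clause, marker: str) -> Optional[str]:
    """Return "PARALLEL" if either rule fires, otherwise None.

    The marker is accepted for a uniform interface but, as in the
    pseudocode, neither rule inspects it.
    """
    # Rule 1: same actor and same tense in both simple sentences.
    if s1.actor == s2.actor and s1.tense == s2.tense:
        return "PARALLEL"
    # Rule 2: the first verb is in a non-finite, repetitive form.
    if s1.verb_form == "non-finite-repetitive":
        return "PARALLEL"
    return None


eating = Clause(actor="Dipshika", tense="present")
watching = Clause(actor="Dipshika", tense="present")
print(parallel(eating, watching, "while"))  # PARALLEL
```

Note that the only thing separating this rule from the non-finite SEQUENCE rule is whether the verb form is repetitive, so that feature has to be extracted reliably.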

IV. Result and Analysis 

We have collected 1000 compound (composite) sentences 
from the short stories "KHIRER PUTUL" and "CHADER 
PAHAR". Out of these 1000 sentences, we have randomly 



Algorithm PRIORITY(Kakhana) 
{ 
If (Kakhana = "[illegible]") then Kakhana_priority = 0 
Else if (Kakhana = "[illegible]") then Kakhana_priority = 1 
Else if (Kakhana = "[illegible]") then Kakhana_priority = 2 
Else if (Kakhana = "[illegible]") then Kakhana_priority = 3 
Else if (Kakhana = "[illegible]") then Kakhana_priority = 4 
Else if (Kakhana = "[illegible]") then Kakhana_priority = 5 
Else if (Kakhana = "[illegible]") then Kakhana_priority = 6 
Else if (Kakhana = "[illegible]") then Kakhana_priority = 7 
Return Kakhana_priority 
} 



Algorithm SEQUENCE(simple_sentence1, simple_sentence2, 
discourse_marker) 
{ 
If ((discourse_marker = "[illegible]") && 
(polarity(simple_sentence1) = polarity(simple_sentence2))) then 
    RST = SEQUENCE 
Else If (((tense(simple_sentence1) = Past tense) && 
(tense(simple_sentence2) = Present tense)) || 
((tense(simple_sentence1) = Present tense) && 
(tense(simple_sentence2) = Future tense))) then 
    RST = SEQUENCE 
P1 = PRIORITY(Kakhana of simple_sentence1) 
P2 = PRIORITY(Kakhana of simple_sentence2) 
If (P1 < P2) then 
    RST = SEQUENCE 
If ((verb(simple_sentence1) = non-finite form of a verb) && 
(verb(simple_sentence1) = non-repetitive form of a verb)) then 
    RST = SEQUENCE 
Return RST 

} 

taken 600 sentences as our training data set and the remaining 
400 sentences as our testing data set. 
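
A random 600/400 split of this kind can be sketched as follows. The function name `split_corpus` and the fixed seed are assumptions for illustration; the paper does not state how the random selection was actually carried out.

```python
import random
from typing import List, Sequence, Tuple


def split_corpus(sentences: Sequence[str], n_train: int,
                 seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle a copy of the corpus and cut it into train and test parts."""
    if not 0 <= n_train <= len(sentences):
        raise ValueError("n_train must be between 0 and the corpus size")
    rng = random.Random(seed)   # fixed seed makes the split repeatable
    shuffled = list(sentences)  # leave the caller's list untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in identifiers for the 1000 collected sentences.
corpus = [f"sentence_{i}" for i in range(1000)]
train, test = split_corpus(corpus, n_train=600)
print(len(train), len(test))  # 600 400
```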

For the training data set, we manually categorized the 
sentences into Contrast, Sequence and Parallel according to 
the definitions of these RST relations. We then manually 
split each composite sentence into two simple sentences, 
created the semantic structure of each sentence, and 
manually identified the discourse marker in each sentence. 
From the semantic representations of the categorized 
sentences we developed rules for each of the RST relations. 
These rules were then formulated as an algorithm for each 
relation, and we built a system based on these algorithms. 

The testing data set, i.e. the remaining 400 sentences, was 
first manually categorized into Contrast, Sequence and 
Parallel according to the definitions of these RST relations. 
Then, to check the accuracy of our system, we gave the 
semantic representations of these 400 sentences to it as input. 

[Line chart: "Accuracy of the System"] 

Figure 3: Line chart showing the accuracy of our system 

Figure 3 shows the accuracy of the system, measured on 
the testing data set. The horizontal axis represents the 
number of sentences tested and the vertical axis represents 
the number of RST relations correctly identified by our 
system. For the first 100 sentences, 91 were identified 
correctly (91%), and so on. When all 400 sentences are 
tested, 383 RST relations are correctly identified by our 
system, an accuracy of 95.75%. 
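
The accuracy implied by these counts is simply the number of correct identifications divided by the number of sentences tested. The short Python sketch below works this out for the two checkpoints reported above; the helper name `accuracy` is an assumption for illustration.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of sentences whose RST relation was identified correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Counts reported for the first 100 and for all 400 test sentences.
print(f"{accuracy(91, 100):.2%}")   # 91.00%
print(f"{accuracy(383, 400):.2%}")  # 95.75%
```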

Conclusion 

In this paper we have discussed the identification of 
Rhetorical Structure Theory relations from discourse 
markers for Bengali language understanding. We have derived 
algorithms for the CONTRAST, SEQUENCE and PARALLEL RST 
relations and explained them with a few examples. We have 
given a user-based evaluation to validate our approach, and 
we have built a small system and a small corpus to check our 
results. 



Future Works 

There are many more RST relations for which algorithms can 
be derived. A system can then be built that combines the 
functions for all of the RST relations. 